GH-47447: [C++] Fix replace_with_mask for null type arrays#49950
Conversation
|
|
|
Hi @raulcd , |
|
Hi @raulcd , |
pitrou
left a comment
There was a problem hiding this comment.
Thanks @AnkitAhlawat7742 for this PR.
Can you add some C++ tests in the existing file vector_replace_test.cc?
You can get some inspiration from the existing TestReplaceBoolean.
…p and updated the python test with null type
HI @pitrou , |
|
Hi @kou , |
There was a problem hiding this comment.
Pull request overview
This PR fixes a crash in the C++ replace_with_mask kernel when operating on NullType arrays (GH-47447), where a Result<int64_t> path incorrectly attempted to return Status::OK(). The fix ensures the kernel returns a valid int64_t and produces a valid output for null arrays, and adds regression tests in both C++ and Python.
Changes:
- Fix
NullTypereplace_with_maskimplementation to return a validint64_t(and set output asArrayData) instead of returningStatus::OK()from aResult<int64_t>function. - Add C++ regression tests covering
replace_with_maskonNullType. - Add Python regression test reproducing the original crash scenario.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
cpp/src/arrow/compute/kernels/vector_replace.cc |
Fixes NullType ReplaceMaskImpl to return a valid int64_t and sets output as ArrayData, preventing the invalid Result<T>(Status::OK()) construction. |
cpp/src/arrow/compute/kernels/vector_replace_test.cc |
Adds a dedicated NullType test fixture and ReplaceWithMask test cases to prevent regressions in the C++ kernel. |
python/pyarrow/tests/test_compute.py |
Adds a Python regression test for pc.replace_with_mask with pa.null() arrays to ensure the crash is fixed end-to-end. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Hi @AlenkaF , |
|
I'll merge this in a few days if nobody objects it. |
zanmato1984
left a comment
There was a problem hiding this comment.
+1, thanks for the fix.
AlenkaF
left a comment
There was a problem hiding this comment.
The one ask I have, before this gets merged, is to move the tests somewhere else in the same file. Currently the PyArrow test is located between two connected tests (test_fill_null_array and test_fill_null_chunked_array). I suggest moving it after test_extract_regex_span which is connected to test_replace_regex.
Also another ask, maybe as a follow-up, would be to add tests for regular use of replace_with_mask kernel in PyArrow. AFAIU there is none currently!
Hi @AlenkaF , Thank you for the review comments! |
|
We have multiple approvals now. I'll merge this. |
Thank you @kou, @AlenkaF, @zanmato1984, and @pitrou for the reviews and merging this PR! I will raise a follow-up PR to add tests for regular replace_with_mask usage in PyArrow. |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 2c1293f. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details. |
Rationale for this change
replace_with_mask crashes when called with null type arrays because a code path incorrectly returns Status::OK() from a function with return type Result<int64_t>.
This PR fixes the issue by ensuring the function returns a valid int64_t value instead of Status::OK() for successful execution.
What changes are included in this PR?
Are these changes tested?
Yes, manually run the test case
Are there any user-facing changes?
No
This PR contains a "Critical Fix".
This change fixes a crash in replace_with_mask when operating on null type arrays. The crash occurs due to an invalid successful return path internally constructing Result with Status::OK().
replace_with_maskcrashes when null type inputs are used #47447